Overview

Dataset statistics

Number of variables: 10
Number of observations: 583
Missing cells: 4
Missing cells (%): 0.1%
Duplicate rows: 13
Duplicate rows (%): 2.2%
Total size in memory: 75.1 KiB
Average record size in memory: 131.9 B
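
The derived figures above can be recomputed from the raw counts. The following is a minimal sketch in plain Python, assuming that the missing-cell percentage is taken over all cells (observations × variables) and that 1 KiB is 1024 bytes; the variable names are illustrative and are not part of the report.

```python
# Raw counts copied from the dataset statistics above.
n_variables = 10
n_observations = 583
missing_cells = 4
duplicate_rows = 13
total_size_kib = 75.1

# Assumption: the missing-cell percentage is taken over all cells in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# The duplicate percentage is relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%  {duplicate_pct:.1f}%  {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, these values agree with the 0.1%, 2.2% and 131.9 B reported above.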

Variable types

NUM: 9
BOOL: 1

Reproduction

Analysis started: 2020-07-03 13:09:19.471133
Analysis finished: 2020-07-03 13:09:30.964243
Duration: 11.49 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Duplicates: Dataset has 13 (2.2%) duplicate rows

Variables

Age
Real number (ℝ≥0)

Distinct count: 72
Unique (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.74614065180103
Minimum: 4
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 4
5-th percentile: 18
Q1: 33
Median: 45
Q3: 58
95-th percentile: 72
Maximum: 90
Range: 86
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 16.1898333
Coefficient of variation (CV): 0.3618151883
Kurtosis: -0.5600656409
Mean: 44.74614065
Median Absolute Deviation (MAD): 12
Skewness: -0.02938531271
Sum: 26087
Variance: 262.1107024
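
Several of the Age statistics are simple functions of the others. As a rough check, the sketch below recomputes them in plain Python from the values listed above, assuming all 583 observations are present (Age reports no missing values); the variable names are illustrative only.

```python
# Values copied from the Age statistics above.
n = 583                 # observations (Age has no missing values)
total = 26087           # Sum
minimum, maximum = 4, 90
q1, q3 = 33, 58
std = 16.1898333        # Standard deviation

mean = total / n                 # sum divided by the number of observations
value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation

print(round(mean, 4), value_range, iqr, round(cv, 4), round(variance, 2))
```

The same relationships (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean) hold for the other numeric variables in this report.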
Histogram with fixed size bins (bins=10)
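Value | Count | Frequency (%)
60 | 34 | 5.8%
45 | 25 | 4.3%
50 | 23 | 3.9%
42 | 21 | 3.6%
38 | 21 | 3.6%
32 | 20 | 3.4%
48 | 20 | 3.4%
55 | 18 | 3.1%
65 | 17 | 2.9%
40 | 17 | 2.9%
46 | 16 | 2.7%
33 | 15 | 2.6%
58 | 14 | 2.4%
75 | 14 | 2.4%
26 | 14 | 2.4%
66 | 12 | 2.1%
35 | 12 | 2.1%
18 | 11 | 1.9%
49 | 11 | 1.9%
36 | 11 | 1.9%
51 | 10 | 1.7%
30 | 10 | 1.7%
70 | 9 | 1.5%
62 | 9 | 1.5%
37 | 9 | 1.5%
Other values (47) | 190 | 32.6%

Value | Count | Frequency (%)
4 | 2 | 0.3%
6 | 1 | 0.2%
7 | 2 | 0.3%
8 | 1 | 0.2%
10 | 1 | 0.2%
11 | 1 | 0.2%
12 | 2 | 0.3%
13 | 4 | 0.7%
14 | 2 | 0.3%
15 | 1 | 0.2%

Value | Count | Frequency (%)
90 | 1 | 0.2%
85 | 1 | 0.2%
84 | 1 | 0.2%
78 | 1 | 0.2%
75 | 14 | 2.4%
74 | 4 | 0.7%
73 | 2 | 0.3%
72 | 8 | 1.4%
70 | 9 | 1.5%
69 | 2 | 0.3%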

Total_Bilirubin
Real number (ℝ≥0)

Distinct count: 113
Unique (%): 19.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.298799313893653
Minimum: 0.4
Maximum: 75.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 0.6
Q1: 0.8
Median: 1
Q3: 2.6
95-th percentile: 16.35
Maximum: 75
Range: 74.6
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 6.209521726
Coefficient of variation (CV): 1.882358136
Kurtosis: 37.16379152
Mean: 3.298799314
Median Absolute Deviation (MAD): 0.3
Skewness: 4.907473994
Sum: 1923.2
Variance: 38.55816007
Histogram with fixed size bins (bins=10)
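Value | Count | Frequency (%)
0.8 | 91 | 15.6%
0.7 | 77 | 13.2%
0.9 | 57 | 9.8%
0.6 | 46 | 7.9%
1 | 28 | 4.8%
1.1 | 19 | 3.3%
1.8 | 14 | 2.4%
1.4 | 13 | 2.2%
1.3 | 12 | 2.1%
1.7 | 11 | 1.9%
2.7 | 9 | 1.5%
2 | 8 | 1.4%
1.2 | 8 | 1.4%
1.9 | 8 | 1.4%
1.6 | 8 | 1.4%
2.2 | 8 | 1.4%
2.9 | 6 | 1.0%
2.6 | 5 | 0.9%
0.5 | 5 | 0.9%
1.5 | 5 | 0.9%
5.8 | 5 | 0.9%
2.4 | 5 | 0.9%
2.8 | 4 | 0.7%
3.9 | 4 | 0.7%
2.1 | 4 | 0.7%
Other values (88) | 123 | 21.1%

Value | Count | Frequency (%)
0.4 | 1 | 0.2%
0.5 | 5 | 0.9%
0.6 | 46 | 7.9%
0.7 | 77 | 13.2%
0.8 | 91 | 15.6%
0.9 | 57 | 9.8%
1 | 28 | 4.8%
1.1 | 19 | 3.3%
1.2 | 8 | 1.4%
1.3 | 12 | 2.1%

Value | Count | Frequency (%)
75 | 1 | 0.2%
42.8 | 1 | 0.2%
32.6 | 1 | 0.2%
30.8 | 1 | 0.2%
30.5 | 2 | 0.3%
27.7 | 1 | 0.2%
27.2 | 1 | 0.2%
26.3 | 1 | 0.2%
25 | 1 | 0.2%
23.3 | 1 | 0.2%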

Direct_Bilirubin
Real number (ℝ≥0)

Distinct count: 80
Unique (%): 13.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.486106346483705
Minimum: 0.1
Maximum: 19.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.1
Q1: 0.2
Median: 0.3
Q3: 1.3
95-th percentile: 8.4
Maximum: 19.7
Range: 19.6
Interquartile range (IQR): 1.1

Descriptive statistics

Standard deviation: 2.808497618
Coefficient of variation (CV): 1.889836232
Kurtosis: 11.35252876
Mean: 1.486106346
Median Absolute Deviation (MAD): 0.2
Skewness: 3.212402862
Sum: 866.4
Variance: 7.887658868
Histogram with fixed size bins (bins=10)
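Value | Count | Frequency (%)
0.2 | 194 | 33.3%
0.1 | 63 | 10.8%
0.3 | 51 | 8.7%
0.8 | 22 | 3.8%
0.4 | 21 | 3.6%
0.5 | 20 | 3.4%
0.6 | 16 | 2.7%
1 | 13 | 2.2%
1.3 | 12 | 2.1%
0.7 | 11 | 1.9%
1.6 | 11 | 1.9%
1.2 | 10 | 1.7%
1.1 | 7 | 1.2%
0.9 | 7 | 1.2%
1.4 | 7 | 1.2%
3.2 | 6 | 1.0%
1.5 | 6 | 1.0%
3 | 5 | 0.9%
2.3 | 5 | 0.9%
3.6 | 4 | 0.7%
2.1 | 4 | 0.7%
2 | 3 | 0.5%
4 | 3 | 0.5%
2.7 | 3 | 0.5%
2.5 | 3 | 0.5%
Other values (55) | 76 | 13.0%

Value | Count | Frequency (%)
0.1 | 63 | 10.8%
0.2 | 194 | 33.3%
0.3 | 51 | 8.7%
0.4 | 21 | 3.6%
0.5 | 20 | 3.4%
0.6 | 16 | 2.7%
0.7 | 11 | 1.9%
0.8 | 22 | 3.8%
0.9 | 7 | 1.2%
1 | 13 | 2.2%

Value | Count | Frequency (%)
19.7 | 1 | 0.2%
18.3 | 1 | 0.2%
17.1 | 1 | 0.2%
14.2 | 1 | 0.2%
14.1 | 1 | 0.2%
13.7 | 1 | 0.2%
12.8 | 1 | 0.2%
12.6 | 2 | 0.3%
12.1 | 1 | 0.2%
11.8 | 2 | 0.3%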

Alkaline_Phosphotase
Real number (ℝ≥0)

Distinct count: 263
Unique (%): 45.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 290.57632933104634
Minimum: 63
Maximum: 2110
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 63
5-th percentile: 137
Q1: 175.5
Median: 208
Q3: 298
95-th percentile: 698.1
Maximum: 2110
Range: 2047
Interquartile range (IQR): 122.5

Descriptive statistics

Standard deviation: 242.9379892
Coefficient of variation (CV): 0.8360556751
Kurtosis: 17.75282846
Mean: 290.5763293
Median Absolute Deviation (MAD): 50
Skewness: 3.765106397
Sum: 169406
Variance: 59018.86659
Histogram with fixed size bins (bins=10)
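Value | Count | Frequency (%)
298 | 11 | 1.9%
198 | 11 | 1.9%
215 | 11 | 1.9%
195 | 10 | 1.7%
180 | 10 | 1.7%
190 | 10 | 1.7%
182 | 9 | 1.5%
145 | 9 | 1.5%
158 | 9 | 1.5%
165 | 8 | 1.4%
218 | 8 | 1.4%
282 | 8 | 1.4%
196 | 7 | 1.2%
188 | 7 | 1.2%
202 | 7 | 1.2%
168 | 6 | 1.0%
192 | 6 | 1.0%
175 | 6 | 1.0%
162 | 6 | 1.0%
310 | 6 | 1.0%
230 | 6 | 1.0%
206 | 6 | 1.0%
205 | 6 | 1.0%
290 | 6 | 1.0%
189 | 6 | 1.0%
Other values (238) | 388 | 66.6%

Value | Count | Frequency (%)
63 | 1 | 0.2%
75 | 1 | 0.2%
90 | 1 | 0.2%
92 | 2 | 0.3%
97 | 1 | 0.2%
98 | 1 | 0.2%
100 | 2 | 0.3%
102 | 1 | 0.2%
103 | 1 | 0.2%
105 | 1 | 0.2%

Value | Count | Frequency (%)
2110 | 1 | 0.2%
1896 | 1 | 0.2%
1750 | 1 | 0.2%
1630 | 1 | 0.2%
1620 | 1 | 0.2%
1580 | 1 | 0.2%
1550 | 1 | 0.2%
1420 | 1 | 0.2%
1350 | 2 | 0.3%
1124 | 1 | 0.2%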

Alamine_Aminotransferase
Real number (ℝ≥0)

Distinct count: 152
Unique (%): 26.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 80.71355060034305
Minimum: 10
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 10
5-th percentile: 15
Q1: 23
Median: 35
Q3: 60.5
95-th percentile: 232
Maximum: 2000
Range: 1990
Interquartile range (IQR): 37.5

Descriptive statistics

Standard deviation: 182.620356
Coefficient of variation (CV): 2.26257369
Kurtosis: 50.57944964
Mean: 80.7135506
Median Absolute Deviation (MAD): 15
Skewness: 6.549191929
Sum: 47056
Variance: 33350.19444
Histogram with fixed size bins (bins=10)
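Value | Count | Frequency (%)
25 | 25 | 4.3%
20 | 23 | 3.9%
22 | 18 | 3.1%
18 | 17 | 2.9%
28 | 17 | 2.9%
21 | 17 | 2.9%
30 | 15 | 2.6%
15 | 14 | 2.4%
48 | 14 | 2.4%
24 | 13 | 2.2%
32 | 12 | 2.1%
31 | 12 | 2.1%
29 | 12 | 2.1%
36 | 11 | 1.9%
50 | 10 | 1.7%
12 | 10 | 1.7%
33 | 10 | 1.7%
26 | 10 | 1.7%
27 | 9 | 1.5%
42 | 9 | 1.5%
23 | 9 | 1.5%
35 | 9 | 1.5%
37 | 9 | 1.5%
40 | 9 | 1.5%
38 | 8 | 1.4%
Other values (127) | 261 | 44.8%

Value | Count | Frequency (%)
10 | 4 | 0.7%
11 | 2 | 0.3%
12 | 10 | 1.7%
13 | 4 | 0.7%
14 | 8 | 1.4%
15 | 14 | 2.4%
16 | 8 | 1.4%
17 | 8 | 1.4%
18 | 17 | 2.9%
19 | 6 | 1.0%

Value | Count | Frequency (%)
2000 | 1 | 0.2%
1680 | 1 | 0.2%
1630 | 1 | 0.2%
1350 | 1 | 0.2%
1250 | 2 | 0.3%
950 | 1 | 0.2%
875 | 2 | 0.3%
790 | 1 | 0.2%
779 | 1 | 0.2%
622 | 1 | 0.2%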

Aspartate_Aminotransferase
Real number (ℝ≥0)

Distinct count: 177
Unique (%): 30.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.91080617495712
Minimum: 10
Maximum: 4929
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 10
5-th percentile: 15.1
Q1: 25
Median: 42
Q3: 87
95-th percentile: 400.9
Maximum: 4929
Range: 4919
Interquartile range (IQR): 62

Descriptive statistics

Standard deviation: 288.9185291
Coefficient of variation (CV): 2.628663542
Kurtosis: 150.9198836
Mean: 109.9108062
Median Absolute Deviation (MAD): 21
Skewness: 10.54617722
Sum: 64078
Variance: 83473.91643
Histogram with fixed size bins (bins=10)
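Value | Count | Frequency (%)
23 | 16 | 2.7%
20 | 14 | 2.4%
21 | 14 | 2.4%
30 | 14 | 2.4%
28 | 13 | 2.2%
25 | 13 | 2.2%
22 | 13 | 2.2%
32 | 12 | 2.1%
24 | 12 | 2.1%
34 | 12 | 2.1%
40 | 11 | 1.9%
19 | 11 | 1.9%
15 | 11 | 1.9%
29 | 11 | 1.9%
18 | 9 | 1.5%
42 | 9 | 1.5%
16 | 9 | 1.5%
26 | 9 | 1.5%
58 | 9 | 1.5%
31 | 8 | 1.4%
17 | 8 | 1.4%
35 | 8 | 1.4%
14 | 8 | 1.4%
27 | 8 | 1.4%
33 | 7 | 1.2%
Other values (152) | 314 | 53.9%

Value | Count | Frequency (%)
10 | 1 | 0.2%
11 | 2 | 0.3%
12 | 5 | 0.9%
13 | 3 | 0.5%
14 | 8 | 1.4%
15 | 11 | 1.9%
16 | 9 | 1.5%
17 | 8 | 1.4%
18 | 9 | 1.5%
19 | 11 | 1.9%

Value | Count | Frequency (%)
4929 | 1 | 0.2%
2946 | 1 | 0.2%
1600 | 1 | 0.2%
1500 | 1 | 0.2%
1050 | 2 | 0.3%
960 | 1 | 0.2%
950 | 1 | 0.2%
850 | 4 | 0.7%
844 | 1 | 0.2%
794 | 1 | 0.2%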

Total_Protiens
Real number (ℝ≥0)

Distinct count: 58
Unique (%): 9.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.483190394511149
Minimum: 2.7
Maximum: 9.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 2.7
5-th percentile: 4.61
Q1: 5.8
Median: 6.6
Q3: 7.2
95-th percentile: 8.1
Maximum: 9.6
Range: 6.9
Interquartile range (IQR): 1.4

Descriptive statistics

Standard deviation: 1.085451484
Coefficient of variation (CV): 0.167425514
Kurtosis: 0.2330385856
Mean: 6.483190395
Median Absolute Deviation (MAD): 0.7
Skewness: -0.2856721864
Sum: 3779.7
Variance: 1.178204924
Histogram with fixed size bins (bins=10)
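Value | Count | Frequency (%)
7 | 32 | 5.5%
6 | 30 | 5.1%
6.8 | 28 | 4.8%
6.9 | 25 | 4.3%
6.2 | 24 | 4.1%
7.1 | 22 | 3.8%
7.2 | 21 | 3.6%
8 | 20 | 3.4%
6.4 | 18 | 3.1%
7.3 | 18 | 3.1%
6.1 | 18 | 3.1%
5.6 | 18 | 3.1%
5.5 | 17 | 2.9%
6.6 | 16 | 2.7%
6.7 | 15 | 2.6%
6.5 | 15 | 2.6%
7.5 | 15 | 2.6%
7.9 | 14 | 2.4%
5.8 | 14 | 2.4%
5.9 | 14 | 2.4%
6.3 | 14 | 2.4%
5.4 | 13 | 2.2%
7.4 | 12 | 2.1%
5.2 | 12 | 2.1%
5.7 | 11 | 1.9%
Other values (33) | 127 | 21.8%

Value | Count | Frequency (%)
2.7 | 1 | 0.2%
2.8 | 1 | 0.2%
3 | 1 | 0.2%
3.6 | 3 | 0.5%
3.7 | 1 | 0.2%
3.8 | 2 | 0.3%
3.9 | 2 | 0.3%
4 | 2 | 0.3%
4.1 | 2 | 0.3%
4.3 | 3 | 0.5%

Value | Count | Frequency (%)
9.6 | 1 | 0.2%
9.5 | 1 | 0.2%
9.2 | 2 | 0.3%
8.9 | 1 | 0.2%
8.7 | 1 | 0.2%
8.6 | 3 | 0.5%
8.5 | 5 | 0.9%
8.4 | 3 | 0.5%
8.3 | 3 | 0.5%
8.2 | 8 | 1.4%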

Albumin
Real number (ℝ≥0)

Distinct count: 40
Unique (%): 6.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.141852487135506
Minimum: 0.9
Maximum: 5.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 0.9
5-th percentile: 1.8
Q1: 2.6
Median: 3.1
Q3: 3.8
95-th percentile: 4.39
Maximum: 5.5
Range: 4.6
Interquartile range (IQR): 1.2

Descriptive statistics

Standard deviation: 0.795518806
Coefficient of variation (CV): 0.253200559
Kurtosis: -0.3879048072
Mean: 3.141852487
Median Absolute Deviation (MAD): 0.6
Skewness: -0.04368472855
Sum: 1831.7
Variance: 0.6328501706
Histogram with fixed size bins (bins=10)
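Value | Count | Frequency (%)
3 | 45 | 7.7%
4 | 37 | 6.3%
2.9 | 29 | 5.0%
3.1 | 28 | 4.8%
3.2 | 26 | 4.5%
3.9 | 25 | 4.3%
2.7 | 24 | 4.1%
2.5 | 24 | 4.1%
3.5 | 23 | 3.9%
3.4 | 21 | 3.6%
2 | 21 | 3.6%
2.6 | 21 | 3.6%
3.3 | 21 | 3.6%
3.7 | 21 | 3.6%
2.8 | 18 | 3.1%
3.6 | 18 | 3.1%
2.4 | 17 | 2.9%
4.1 | 16 | 2.7%
3.8 | 15 | 2.6%
4.3 | 14 | 2.4%
2.1 | 14 | 2.4%
2.3 | 12 | 2.1%
4.2 | 12 | 2.1%
1.8 | 12 | 2.1%
2.2 | 12 | 2.1%
Other values (15) | 57 | 9.8%

Value | Count | Frequency (%)
0.9 | 2 | 0.3%
1 | 1 | 0.2%
1.4 | 3 | 0.5%
1.5 | 3 | 0.5%
1.6 | 8 | 1.4%
1.7 | 3 | 0.5%
1.8 | 12 | 2.1%
1.9 | 7 | 1.2%
2 | 21 | 3.6%
2.1 | 14 | 2.4%

Value | Count | Frequency (%)
5.5 | 2 | 0.3%
5 | 1 | 0.2%
4.9 | 4 | 0.7%
4.8 | 2 | 0.3%
4.7 | 3 | 0.5%
4.6 | 4 | 0.7%
4.5 | 6 | 1.0%
4.4 | 8 | 1.4%
4.3 | 14 | 2.4%
4.2 | 12 | 2.1%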

Albumin_and_Globulin_Ratio
Real number (ℝ≥0)

Distinct count: 69
Unique (%): 11.9%
Missing: 4
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9470639032815199
Minimum: 0.3
Maximum: 2.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 0.3
5-th percentile: 0.5
Q1: 0.7
Median: 0.93
Q3: 1.1
95-th percentile: 1.5
Maximum: 2.8
Range: 2.5
Interquartile range (IQR): 0.4

Descriptive statistics

Standard deviation: 0.3195921077
Coefficient of variation (CV): 0.337455695
Kurtosis: 3.281899825
Mean: 0.9470639033
Median Absolute Deviation (MAD): 0.17
Skewness: 0.992299448
Sum: 548.35
Variance: 0.1021391153
Histogram with fixed size bins (bins=10)
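Value | Count | Frequency (%)
1 | 106 | 18.2%
0.8 | 65 | 11.1%
0.9 | 59 | 10.1%
0.7 | 53 | 9.1%
1.1 | 46 | 7.9%
1.2 | 35 | 6.0%
0.6 | 31 | 5.3%
0.5 | 29 | 5.0%
1.3 | 25 | 4.3%
1.4 | 17 | 2.9%
0.4 | 14 | 2.4%
1.5 | 10 | 1.7%
1.6 | 5 | 0.9%
1.7 | 4 | 0.7%
0.3 | 4 | 0.7%
0.75 | 4 | 0.7%
0.96 | 3 | 0.5%
1.38 | 3 | 0.5%
1.8 | 3 | 0.5%
0.47 | 2 | 0.3%
0.92 | 2 | 0.3%
1.16 | 2 | 0.3%
0.52 | 2 | 0.3%
0.76 | 2 | 0.3%
0.93 | 2 | 0.3%
Other values (44) | 51 | 8.7%
(Missing) | 4 | 0.7%

Value | Count | Frequency (%)
0.3 | 4 | 0.7%
0.35 | 1 | 0.2%
0.37 | 1 | 0.2%
0.39 | 1 | 0.2%
0.4 | 14 | 2.4%
0.45 | 1 | 0.2%
0.46 | 1 | 0.2%
0.47 | 2 | 0.3%
0.48 | 1 | 0.2%
0.5 | 29 | 5.0%

Value | Count | Frequency (%)
2.8 | 1 | 0.2%
2.5 | 2 | 0.3%
1.9 | 1 | 0.2%
1.85 | 2 | 0.3%
1.8 | 3 | 0.5%
1.72 | 1 | 0.2%
1.7 | 4 | 0.7%
1.66 | 1 | 0.2%
1.6 | 5 | 0.9%
1.58 | 2 | 0.3%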

Dataset
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.7 KiB

Value | Count | Frequency (%)
Yes | 416 | 71.4%
No | 167 | 28.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
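
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the profiling tool runs; the function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Made-up data: two exactly linear relationships with different slopes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2 * v + 1 for v in x])      # increasing line
r_down = pearson_r(x, [-3 * v + 10 for v in x])  # decreasing line
print(r_up, r_down)
```

Up to floating-point rounding, the two results are +1 and -1: changing the slope or the offset of the line changes only the sign of r, not its magnitude.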

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
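
As a rough illustration of the rank-based calculation, the sketch below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. It assumes tied values receive the average of their rank positions, which is a common convention but is not stated in this report; the function names and data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson's formula applied to the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)

# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because the ranks of x and y are identical here, ρ comes out as +1 (up to floating-point rounding) even though the relationship is not linear.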

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
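
The pair-counting definition above can be sketched directly in plain Python. This follows the formula exactly as stated (difference of concordant and discordant pairs over all pairs), so tied pairs count toward neither group; tie-adjusted variants used by some libraries can give different values. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # product == 0 is a tie and counts toward neither group
    return (concordant - discordant) / len(pairs)

# Made-up data: only the pair (2, 3) is ordered differently by y.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)
print(tau)
```

With 4 observations there are 6 pairs; 5 are concordant and 1 is discordant, so τ = (5 − 1) / 6 ≈ 0.667.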

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

Sample

First rows

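(index) | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Dataset
0 | 65 | 0.7 | 0.1 | 187 | 16 | 18 | 6.8 | 3.3 | 0.90 | Yes
1 | 62 | 10.9 | 5.5 | 699 | 64 | 100 | 7.5 | 3.2 | 0.74 | Yes
2 | 62 | 7.3 | 4.1 | 490 | 60 | 68 | 7.0 | 3.3 | 0.89 | Yes
3 | 58 | 1.0 | 0.4 | 182 | 14 | 20 | 6.8 | 3.4 | 1.00 | Yes
4 | 72 | 3.9 | 2.0 | 195 | 27 | 59 | 7.3 | 2.4 | 0.40 | Yes
5 | 46 | 1.8 | 0.7 | 208 | 19 | 14 | 7.6 | 4.4 | 1.30 | Yes
6 | 26 | 0.9 | 0.2 | 154 | 16 | 12 | 7.0 | 3.5 | 1.00 | Yes
7 | 29 | 0.9 | 0.3 | 202 | 14 | 11 | 6.7 | 3.6 | 1.10 | Yes
8 | 17 | 0.9 | 0.3 | 202 | 22 | 19 | 7.4 | 4.1 | 1.20 | No
9 | 55 | 0.7 | 0.2 | 290 | 53 | 58 | 6.8 | 3.4 | 1.00 | Yes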

Last rows

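(index) | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Dataset
573 | 32 | 3.7 | 1.6 | 612 | 50 | 88 | 6.2 | 1.9 | 0.40 | Yes
574 | 32 | 12.1 | 6.0 | 515 | 48 | 92 | 6.6 | 2.4 | 0.50 | Yes
575 | 32 | 25.0 | 13.7 | 560 | 41 | 88 | 7.9 | 2.5 | 2.50 | Yes
576 | 32 | 15.0 | 8.2 | 289 | 58 | 80 | 5.3 | 2.2 | 0.70 | Yes
577 | 32 | 12.7 | 8.4 | 190 | 28 | 47 | 5.4 | 2.6 | 0.90 | Yes
578 | 60 | 0.5 | 0.1 | 500 | 20 | 34 | 5.9 | 1.6 | 0.37 | No
579 | 40 | 0.6 | 0.1 | 98 | 35 | 31 | 6.0 | 3.2 | 1.10 | Yes
580 | 52 | 0.8 | 0.2 | 245 | 48 | 49 | 6.4 | 3.2 | 1.00 | Yes
581 | 31 | 1.3 | 0.5 | 184 | 29 | 32 | 6.8 | 3.4 | 1.00 | Yes
582 | 38 | 1.0 | 0.3 | 216 | 21 | 24 | 7.3 | 4.4 | 1.50 | No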

Duplicate rows

Most frequent

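(index) | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Dataset | count
0 | 18 | 0.8 | 0.2 | 282 | 72 | 140 | 5.5 | 2.5 | 0.80 | Yes | 2
1 | 30 | 1.6 | 0.4 | 332 | 84 | 139 | 5.6 | 2.7 | 0.90 | Yes | 2
2 | 31 | 0.6 | 0.1 | 175 | 48 | 34 | 6.0 | 3.7 | 1.60 | Yes | 2
3 | 34 | 4.1 | 2.0 | 289 | 875 | 731 | 5.0 | 2.7 | 1.10 | Yes | 2
4 | 36 | 0.8 | 0.2 | 158 | 29 | 39 | 6.0 | 2.2 | 0.50 | No | 2
5 | 36 | 5.3 | 2.3 | 145 | 32 | 92 | 5.1 | 2.6 | 1.00 | No | 2
6 | 38 | 2.6 | 1.2 | 410 | 59 | 57 | 5.6 | 3.0 | 0.80 | No | 2
7 | 39 | 1.9 | 0.9 | 180 | 42 | 62 | 7.4 | 4.3 | 1.38 | Yes | 2
8 | 40 | 0.9 | 0.3 | 293 | 232 | 245 | 6.8 | 3.1 | 0.80 | Yes | 2
9 | 42 | 8.9 | 4.5 | 272 | 31 | 61 | 5.8 | 2.0 | 0.50 | Yes | 2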